28 research outputs found

    Hyper-g Priors for Generalized Linear Models

    We develop an extension of the classical Zellner's g-prior to generalized linear models. The prior on the hyperparameter g is handled in a flexible way, so that any continuous proper hyperprior f(g) can be used, giving rise to a large class of hyper-g priors. Connections with the literature are described in detail. A fast and accurate integrated Laplace approximation of the marginal likelihood makes inference in large model spaces feasible. For posterior parameter estimation we propose an efficient and tuning-free Metropolis-Hastings sampler. The methodology is illustrated with variable selection and automatic covariate transformation in the Pima Indians diabetes data set. Comment: 30 pages, 12 figures, poster contribution at ISBA 201
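To make the prior structure concrete: the generalized g-prior described above places a mean-zero Gaussian prior on the regression coefficients with covariance g times the inverse expected Fisher information. The sketch below illustrates this for a one-covariate logistic regression without intercept; the data, the value of g, and the assumed MLE are illustrative, not from the paper (which would additionally put a hyperprior f(g) on g).

```python
import math

# Minimal sketch of the generalized g-prior covariance for a GLM:
# beta ~ N(0, g * I(beta)^{-1}), where I(beta) is the expected Fisher
# information. One-covariate logistic regression without intercept;
# data, g, and beta_mle are illustrative assumptions.

def fisher_info_logistic(x, beta):
    """Expected Fisher information I(beta) = sum_i x_i^2 * p_i * (1 - p_i)."""
    info = 0.0
    for xi in x:
        p = 1.0 / (1.0 + math.exp(-beta * xi))
        info += xi * xi * p * (1.0 - p)
    return info

x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
beta_mle = 0.8   # assumed MLE (illustrative)
g = 4.0          # fixed g here; the hyper-g prior would assign a prior f(g)

prior_var = g / fisher_info_logistic(x, beta_mle)
print(prior_var)
```

Larger g yields a flatter prior on the coefficient; the hyper-g construction lets the data inform g instead of fixing it.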

    Approximate Bayesian Model Selection with the Deviance Statistic

    Bayesian model selection poses two main challenges: the specification of parameter priors for all models, and the computation of the resulting Bayes factors between models. There is now a large literature on automatic and objective parameter priors in the linear model. One important class are g-priors, which were recently extended from linear to generalized linear models (GLMs). We show that the resulting Bayes factors can be approximated by test-based Bayes factors (Johnson [Scand. J. Stat. 35 (2008) 354-368]) using the deviance statistics of the models. To estimate the hyperparameter g, we propose empirical and fully Bayes approaches and link the former to minimum Bayes factors and shrinkage estimates from the literature. Furthermore, we describe how to approximate the corresponding posterior distribution of the regression coefficients based on the standard GLM output. We illustrate the approach with the development of a clinical prediction model for 30-day survival in the GUSTO-I trial using logistic regression. Comment: Published at http://dx.doi.org/10.1214/14-STS510 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
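The test-based Bayes factor admits a closed form in terms of the deviance. A minimal sketch, assuming the standard form for a deviance statistic z on d degrees of freedom under a g-prior, TBF = (1 + g)^(-d/2) * exp(g * z / (2 * (1 + g))), together with the empirical Bayes estimate g_hat = max(z/d - 1, 0); the numeric values of z and d below are illustrative.

```python
import math

# Hedged sketch of the test-based Bayes factor (TBF) computed from the
# deviance statistic z on d degrees of freedom, under a g-prior:
#     TBF = (1 + g)^(-d/2) * exp( g * z / (2 * (1 + g)) )
# The empirical Bayes estimate maximizing the TBF over g is
#     g_hat = max(z / d - 1, 0).
# z and d below are illustrative, not from the GUSTO-I analysis.

def tbf(z, d, g):
    """Test-based Bayes factor of the model against the null."""
    return (1.0 + g) ** (-d / 2.0) * math.exp(g * z / (2.0 * (1.0 + g)))

def g_empirical_bayes(z, d):
    """Maximizer of the TBF over g (set to 0 if z/d <= 1)."""
    return max(z / d - 1.0, 0.0)

z, d = 11.3, 2                      # illustrative deviance and df
g_hat = g_empirical_bayes(z, d)
print(g_hat, tbf(z, d, g_hat))
```

Setting g = 0 recovers TBF = 1 (no evidence either way), and the empirical Bayes choice gives the largest possible TBF, which is why it connects to minimum Bayes factors.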

    Bayesian fractional polynomials

    This paper sets out to implement the Bayesian paradigm for fractional polynomial models under the assumption of normally distributed error terms. Fractional polynomials widen the class of ordinary polynomials and offer an additive and transportable modelling approach. The methodology is based on a Bayesian linear model with a quasi-default hyper-g prior and combines variable selection with parametric modelling of additive effects. A Markov chain Monte Carlo algorithm for the exploration of the model space is presented. This theoretically well-founded stochastic search constitutes a substantial improvement to ad hoc stepwise procedures for the fitting of fractional polynomial models. The method is applied to a data set on the relationship between ozone levels and meteorological parameters, previously analysed in the literature.
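A brief sketch of the fractional polynomial (FP) transforms themselves, following the usual Royston-Altman convention (powers from {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, power 0 meaning log(x), and a repeated power p contributing x^p * log(x)); the evaluation points are illustrative, and x is assumed positive, as in practice after shifting/scaling.

```python
import math

# Hedged sketch of fractional polynomial (FP) basis evaluation.
# Powers are drawn from the conventional set below; power 0 means log(x),
# and a repeated power p (powers assumed sorted) contributes the previous
# term multiplied by log(x), e.g. (1, 1) -> (x, x*log(x)).
# Requires x > 0.

FP_POWERS = (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0)

def fp_transform(x, powers):
    """Return the FP basis evaluated at x for a sorted tuple of powers."""
    out, prev = [], None
    for p in powers:
        term = math.log(x) if p == 0.0 else x ** p
        if p == prev:                    # repeated power: multiply by log(x)
            term = out[-1] * math.log(x)
        out.append(term)
        prev = p
    return out

print(fp_transform(2.0, (0.5, 2.0)))     # FP2 with powers (0.5, 2)
print(fp_transform(2.0, (1.0, 1.0)))     # repeated power: [x, x*log(x)]
```

The model space searched by the MCMC algorithm is then the set of covariate subsets combined with a choice of FP powers per included covariate.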

    Objective Bayesian variable and function selection with hyper-g priors

    Bayesian model selection poses two main challenges: the specification of parameter priors for all models, and the computation of the resulting posterior model probabilities via the marginal likelihoods. There is now a large literature on automatic and objective parameter priors, which unburden the statistician from manually eliciting the parameter priors for all models in the absence of substantive prior information. One important example is Zellner’s g-prior, which has become a favourite choice of prior in the Gaussian linear model, due to various favourable properties. Continuous mixtures of Zellner’s g-priors are obtained by assigning a hyperprior to the prior covariance factor g. These hyper-g priors avoid the user’s choice of g, which can be very influential in the statistical analysis, and allow for a closed-form marginal likelihood for specific hyperpriors. In earlier work we used fractional polynomial (FP) transformations, which are an extension of classical polynomials, together with hyper-g priors, to perform variable and function selection in Gaussian models.
For generalized linear models (GLMs), a natural candidate for a generalized g-prior is a mean-zero Gaussian prior on the regression coefficients, with g times the inverse expected Fisher information as the covariance matrix. The generalized hyper-g prior specifies an additional (arbitrary) hyperprior on the scaling factor g. We solve the main difficulty, the computation of the marginal likelihood, with an integrated Laplace approximation. This accurate approach allows us to explore the model space with a Markov chain Monte Carlo (MCMC) based stochastic search, avoiding the simultaneous sampling of model parameters of varying dimensions and yielding a sample of promising models. Subsequently, we sample model-specific parameters using a tuning-free Metropolis-Hastings algorithm. Splines are an attractive alternative to FPs, because they are more flexible. We represent the splines as mixed models, where the non-linear parts are parametrized by the random effects. After integrating them out, we can apply the hyper-g prior to the remaining coefficients that parametrize the linear parts of the covariate effects. Each additive model is defined by the collection of (integer) degrees of freedom for all covariates, where we also allow for exclusion and strictly linear inclusion of covariates. For GLMs, we use the iteratively weighted least squares algorithm to obtain a linear model approximation, from which we then derive the appropriate form of the prior covariance matrix for the hyper-g prior. In a simulation study we find that our method performs competitively in comparison with several other Bayesian additive model selection procedures. We use the method to derive logistic regression models for estimating diabetes risk. In order to analyse survival data, we extend the hyper-g prior to proportional hazards regression. The first idea is to use a Poisson model approximation of the full likelihood, which was first proposed by Cai and Betensky (2003).
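The iteratively weighted least squares (IWLS) step mentioned above can be sketched as follows: each iteration forms a working response z and weights w from the current fit, and at convergence the quantity X'WX supplies the Fisher-information structure for the g-prior covariance g * (X'WX)^{-1}. A minimal one-covariate logistic regression without intercept, with illustrative data; not the paper's implementation.

```python
import math

# Hedged sketch: IWLS for a one-covariate logistic regression (no
# intercept). Each step builds weights w_i = p_i(1-p_i) and working
# responses z_i = eta_i + (y_i - p_i)/w_i, then solves the weighted
# least squares problem. At convergence, X'WX gives the g-prior
# covariance g * (X'WX)^{-1} (a scalar here). Data are illustrative.

def iwls_step(x, y, beta):
    num = den = 0.0
    for xi, yi in zip(x, y):
        eta = beta * xi
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)                  # IWLS weight
        z = eta + (yi - p) / w             # working response
        num += w * xi * z
        den += w * xi * xi
    return num / den, den                  # updated beta, X'WX

x = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 1]
beta, xtwx = 0.0, None
for _ in range(20):                        # iterate to (near) convergence
    beta, xtwx = iwls_step(x, y, beta)

g = 4.0
prior_var = g / xtwx                       # g-prior variance (scalar case)
print(round(beta, 3), prior_var > 0)
```

With several covariates the same recipe yields the matrix X'WX, and hence the prior covariance structure used for the hyper-g prior in the GLM case.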
We describe how this approximation can be corrected, obtaining a data augmentation whose size grows quadratically with the sample size. The second idea retains linear complexity and builds on so-called test-based Bayes factors (TBFs), which were proposed by Johnson (2005). Instead of computing the marginal likelihood for the original data, it essentially computes the marginal likelihood for the (partial) likelihood ratio test statistics (also called deviances). We explain that the prior which is implicit in this approximation is exactly our generalized g-prior, and assign a hyperprior to the scaling factor g, which leads to TBF-based hyper-g priors. For the development of a clinical prediction model with logistic regression, we observe good approximation accuracy and competitive performance in a bootstrap study. For a Cox regression application, we observe similar results as with the Poisson model approximation.

    Likelihood and Bayesian inference: with applications in biology and medicine

    This richly illustrated textbook covers modern statistical methods with applications in medicine, epidemiology and biology. Firstly, it discusses the importance of statistical models in applied quantitative research and the central role of the likelihood function, describing likelihood-based inference from a frequentist viewpoint, and exploring the properties of the maximum likelihood estimate, the score function, the likelihood ratio and the Wald statistic. In the second part of the book, likelihood is combined with prior information to perform Bayesian inference. Topics include Bayesian updating, conjugate and reference priors, Bayesian point and interval estimates, Bayesian asymptotics and empirical Bayes methods. It includes a separate chapter on modern numerical techniques for Bayesian inference, and also addresses advanced topics, such as model choice and prediction from frequentist and Bayesian perspectives. This revised edition of the book “Applied Statistical Inference” has been expanded to include new material on Markov models for time series analysis. It also features a comprehensive appendix covering the prerequisites in probability theory, matrix algebra, mathematical calculus, and numerical analysis, and each chapter is complemented by exercises. The text is primarily intended for graduate statistics and biostatistics students with an interest in applications.


    Applied statistical inference: likelihood and Bayes


    Objective Bayesian model selection for Cox regression

    There is now a large literature on objective Bayesian model selection in the linear model based on the g-prior. The methodology has been recently extended to generalized linear models using test-based Bayes factors. In this paper, we show that test-based Bayes factors can also be applied to the Cox proportional hazards model. If the goal is to select a single model, then both the maximum a posteriori and the median probability model can be calculated. For clinical prediction of survival, we shrink the model-specific log hazard ratio estimates with subsequent calculation of the Breslow estimate of the cumulative baseline hazard function. A Bayesian model average can also be employed. We illustrate the proposed methodology with the analysis of survival data on primary biliary cirrhosis patients and the development of a clinical prediction model for future cardiovascular events based on data from the Second Manifestations of ARTerial disease (SMART) cohort study. Cross-validation is applied to compare the predictive performance with alternative model selection approaches based on Harrell's c-index, the calibration slope and the integrated Brier score. Finally, a novel application of Bayesian variable selection to optimal conditional prediction via landmarking is described. Copyright © 2016 John Wiley & Sons, Ltd.
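The Breslow estimate referred to above has a simple closed form: the cumulative baseline hazard at time t sums, over event times t_i <= t, the event count d_i divided by the risk-set total of exp(x_j' beta). A minimal sketch with one covariate and illustrative right-censored data (not the PBC or SMART data); beta stands in for a shrunken log hazard ratio.

```python
import math

# Hedged sketch of the Breslow estimator of the cumulative baseline
# hazard in a Cox model:
#     H0(t) = sum_{event times t_i <= t} d_i / sum_{j in R(t_i)} exp(x_j * beta)
# where R(t_i) is the risk set at t_i and d_i the number of events there.
# One covariate and illustrative data; beta plays the role of a
# shrunken log hazard ratio estimate.

def breslow_cumhaz(times, events, x, beta, t):
    """Cumulative baseline hazard at time t for right-censored data."""
    event_times = sorted({ti for ti, ei in zip(times, events)
                          if ei == 1 and ti <= t})
    H0 = 0.0
    for s in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ei == 1 and ti == s)
        risk = sum(math.exp(beta * xj)
                   for ti, xj in zip(times, x) if ti >= s)
        H0 += d / risk
    return H0

times  = [2.0, 3.0, 4.0, 5.0, 7.0, 9.0]
events = [1,   0,   1,   1,   0,   1]     # 1 = event, 0 = censored
x      = [0.5, -1.0, 0.0, 1.0, -0.5, 0.3]
beta   = 0.7                              # illustrative shrunken estimate

print(round(breslow_cumhaz(times, events, x, beta, 5.0), 4))
```

An individual's predicted survival probability then follows as exp(-H0(t) * exp(x * beta)), which is the quantity assessed by the calibration and Brier-score comparisons.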